Undoubtedly, we’ve all heard about “flattening the curve”: the goal of slowing the rate of new COVID-19 infections by limiting person-to-person contacts. A related question is what the curve even looks like right now. Is it accelerating or slowing down? When is the peak expected to occur? Now that data collection has become more systematic, we are in a position to examine these questions through the lens of the values observed so far.
Below, I apply simple curve fitting to model new COVID-19 cases in the U.S. over time. The data is aggregated from three different sources (see below). I chose to fit a log-normal distribution, which is traditionally used by epidemiologists to study cases by date of onset. After fitting the curve, I use it to plot projections for one week into the future.
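To make the fitting step concrete, here is a minimal sketch of one way to fit a scaled log-normal curve and project it a week ahead. It is not necessarily the procedure behind the plot. It assumes the curve has the form scale × log-normal density, and it relies on the fact that ln(cases × t) is then a quadratic in ln(t), so ordinary least squares is enough. The function names (`lognormal_curve`, `fit_lognormal`, `solve3`) and the input data are hypothetical; the data is synthetic, generated from a known curve rather than taken from real case counts.

```python
import math

def lognormal_curve(t, scale, mu, sigma):
    """Scaled log-normal density: modeled new cases on day t (t > 0)."""
    return scale / (t * sigma * math.sqrt(2 * math.pi)) * math.exp(
        -((math.log(t) - mu) ** 2) / (2 * sigma ** 2))

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def fit_lognormal(days, cases):
    """Least-squares fit of ln(cases * t) = c0 + c1*u + c2*u^2, with u = ln(t).

    Requires every day index and every case count to be positive.
    """
    us = [math.log(t) for t in days]
    zs = [math.log(y * t) for t, y in zip(days, cases)]
    # Normal equations for a quadratic polynomial in u.
    power_sums = [sum(u ** k for u in us) for k in range(5)]
    rhs = [sum(z * u ** k for u, z in zip(us, zs)) for k in range(3)]
    a = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve3(a, rhs)
    if c2 >= 0:
        raise ValueError("data does not look log-normal (no downward curvature)")
    # Map polynomial coefficients back to log-normal parameters.
    sigma = math.sqrt(-1.0 / (2.0 * c2))
    mu = c1 * sigma ** 2
    scale = math.exp(c0 + mu ** 2 / (2 * sigma ** 2)) * sigma * math.sqrt(2 * math.pi)
    return scale, mu, sigma

# Synthetic example: 40 days generated from a known curve (not real data).
days = list(range(1, 41))
cases = [lognormal_curve(t, 1e6, 3.5, 0.5) for t in days]

scale, mu, sigma = fit_lognormal(days, cases)

# Project one week past the last observed day.
projection = [lognormal_curve(t, scale, mu, sigma) for t in range(41, 48)]
```

One design caveat: fitting in log space gives small early counts as much weight as large recent ones. A nonlinear least-squares fit on the raw counts weights the data differently and may produce a different curve on noisy data.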
The plot is interactive and best viewed on a computer, where you can hover a mouse pointer over individual points, as well as click and drag to pan and zoom around. The functionality is more limited on a mobile screen and varies from device to device. On some devices, the plotting area prevents you from scrolling with your finger; swipe along the edges of the plot to scroll instead.
Since the beginning of April, there’s been a distinct periodicity in case reports. In particular, we can observe spikes in the number of cases reported every Friday over the past several weeks. This likely indicates that human activity settled into a weekly routine, as everybody adjusted to the “new normal” after the initial spread of the virus.
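A quick way to check for this kind of weekly pattern is to average the reported counts by day of the week. The sketch below does that on synthetic numbers, with a flat baseline and an artificial Friday bump, so the values are purely illustrative; `mean_by_weekday` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def mean_by_weekday(daily):
    """Average the daily counts separately for each day of the week."""
    buckets = defaultdict(list)
    for day, count in daily.items():
        buckets[WEEKDAYS[day.weekday()]].append(count)
    return {name: mean(counts) for name, counts in buckets.items()}

# Synthetic data: four weeks starting April 1, with a bump every Friday.
start = date(2020, 4, 1)
daily = {}
for offset in range(28):
    day = start + timedelta(days=offset)
    daily[day] = 30000 + (5000 if day.weekday() == 4 else 0)

averages = mean_by_weekday(daily)
busiest = max(averages, key=averages.get)
```

On real data the weekday averages would also carry the overall trend, so it may be worth subtracting a fitted curve or a 7-day moving average before comparing days.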
The observed data is aggregated across three data sources: Johns Hopkins University (JHU), The COVID Tracking Project (CTP), and The New York Times (NYT). The reason for aggregation is to smooth out small discrepancies in reporting. For example, in the plot below you may notice that the number of cases was under-reported for Mar 18th and over-reported for Mar 19th by JHU, relative to the other two data sources. (This is likely due to time zone differences.) To reduce the effect of such artifacts, the curve is fit to the median values computed for each date.
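The per-date median can be sketched in a few lines. The counts below are made up to mimic the kind of one-day shift described above; they are not the actual figures from any of the three sources, and `median_by_date` is a hypothetical helper name.

```python
from statistics import median

# Made-up daily new-case counts per source (illustrative only).
reports = {
    "JHU": {"2020-03-18": 1000, "2020-03-19": 5000, "2020-03-20": 5400},
    "CTP": {"2020-03-18": 2800, "2020-03-19": 3300, "2020-03-20": 5500},
    "NYT": {"2020-03-18": 2900, "2020-03-19": 3200, "2020-03-20": 5600},
}

def median_by_date(reports):
    """For each date, take the median of the counts reported by all sources."""
    dates = sorted(set().union(*(series.keys() for series in reports.values())))
    return {
        d: median(series[d] for series in reports.values() if d in series)
        for d in dates
    }

medians = median_by_date(reports)
```

With three sources, the median simply discards the highest and lowest value for each date, so a single source that shifts cases from one day to the next does not distort the series being fit.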